ABSTRACT

The University of California, Santa Cruz Genome 
Browser (http://genome.ucsc.edu) offers online 
access to a database of genomic sequence and 
annotation data for a wide variety of organisms. 
The Browser also has many tools for visualizing, 
comparing and analyzing both publicly available 
and user-generated genomic data sets, aligning 
sequences and uploading user data. Among the 
features released this year are a gene search tool 
and annotation track drag-reorder functionality 
as well as support for BAM and BigWig/BigBed file 
formats. New display enhancements include overlay 
of multiple wiggle tracks through use of transparent 
coloring, options for displaying transformed wiggle 
data, a 'mean+whiskers' windowing function for 
display of wiggle data at high zoom levels, and 
more color schemes for microarray data. New data 
highlights include seven new genome assemblies, a 
Neandertal genome data portal, phenotype and 
disease association data, a human RNA editing 
track, and a zebrafish Conservation track. We also 
describe updates to existing tracks. 

INTRODUCTION 

The University of California, Santa Cruz (UCSC) 
Genome Browser provides online access to sequence and 
annotation data for the human genome and those of 
several other species (1,2). The level of annotation 
differs among species, with recent assemblies of the human 
genome being the most richly annotated. The Genome 
Browser contains mapping and sequencing annotation 
tracks describing assembly, gap and GC percent details 
for all assemblies. Most organisms also have tracks con- 
taining alignments of RefSeq genes (3,4), mRNAs and 
ESTs from GenBank (5) as well as gene and gene predic- 
tion tracks such as Ensembl Genes (6). UCSC Genes, a 
gene prediction track generated at UCSC that is based on 
data from RefSeq, GenBank, CCDS and UniProt (2,7,8), 
is present for the most recent human and mouse 
assemblies. Most organisms also have comparative 
genomic tracks showing pairwise genomic alignments 
between assemblies. Roughly half the organisms hosted 
in the browser have a multiple sequence alignment track 
(multiz) (9). Expression, regulation, variation and pheno- 
type tracks are available for many organisms. Track de- 
scriptions can be accessed by clicking on a track item, the 
track title or the vertical bar to the left of the track in the 
image. Links to the corresponding locations on the NCBI 
Map Viewer (10) and Ensembl (6) genome browsers are 
also provided. 

UCSC hosts the Data Coordination Center for the 
Encyclopedia of DNA Elements (ENCODE) project, 
using the Genome Browser website as its primary data 
portal (11-13). Genome-wide production phase data 
were initially published on the hg18 (NCBI build 36) 
human assembly and are currently being migrated to the 
hg19 (Genome Reference Consortium GRCh37) browser. 
(For more detail see Raney et al. in this issue.) 

The Genome Browser includes many tools for 
visualizing and analyzing genomic data. Sequence data 
can be retrieved via the 'Get DNA' utility, the Table 
Browser (14) or direct download (see below). The Table 
Browser also serves as a tool for retrieving and exploring 
Genome Browser data through filtering, intersecting and 
correlating the underlying database tables. Output from 
the Table Browser can also be sent to other tools such as 
Galaxy (15) or GREAT (see 'New features' section) for 
subsequent analysis. The BLAT (16) and in silico PCR 
tools align sequences to genomes available in the 
browser. The LiftOver utility, available as both a web 
interface (at http://genome.ucsc.edu/cgi-bin/hgLiftOver) 
and a command-line executable program (from 
http://hgdownload.cse.ucsc.edu/admin/exe/), translates genomic 
coordinates between assemblies. The Gene Sorter (17) 
allows users to explore the relationships between genes 
by comparing expression profiles, protein homology and 
other useful metrics of similarity. The Proteome Browser 
displays protein properties and sequence data as tracks 
and histograms (18). VisiGene (19) is a searchable 
database of Xenopus and mouse in situ images showing 
cytology and expression patterns. Users can upload and 
view their data in the context of the other browser tracks 
using the custom tracks tool (8). Once uploaded, custom 
track data can be manipulated using any of the standard 
Genome Browser functionalities including the Table 
Browser. Track display configurations can also be saved 
and shared using the Sessions tool (8). Finally, Genome 
Graphs displays hosted tracks and user-generated custom 
tracks in the context of a genome-wide view. 
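
The coordinate translation performed by LiftOver, mentioned above, can be illustrated with a minimal sketch. It assumes a simplified model in which the two assemblies are related by a list of gapless aligned blocks; the block values and the function name `lift_position` are hypothetical, and the real utility works from chain alignment files with considerably more logic than is shown here.

```python
from typing import List, Optional, Tuple

# Each block is (old_start, new_start, length): a gapless aligned stretch
# shared by the old and new assemblies. The values used below are made up.
Block = Tuple[int, int, int]


def lift_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a 0-based position on the old assembly to the new assembly.

    Returns None when the position falls outside every block, meaning
    it has no counterpart in the new assembly under this simple model.
    """
    for old_start, new_start, length in blocks:
        if old_start <= pos < old_start + length:
            # Preserve the offset within the aligned block.
            return new_start + (pos - old_start)
    return None


# Hypothetical alignment: two blocks separated by an unaligned gap.
blocks = [(100, 150, 50), (200, 260, 30)]
print(lift_position(120, blocks))  # falls inside the first block
print(lift_position(175, blocks))  # falls in the gap between blocks
```

Positions that land between blocks are reported as unmapped rather than guessed, which mirrors the general behavior one would expect when a region is absent from the target assembly.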

Bulk downloads of sequence and annotation data and 
Genome Browser source code can be found at 
http://hgdownload.cse.ucsc.edu/. The source includes the 
browser bioinformatic command-line utilities 
(http://genomewiki.ucsc.edu/index.php/Kent_source_utilities). 
Instructions for setting up a mirror are available at 
http://genome.ucsc.edu/admin/mirror.html. Assemblies and data 
of interest can also be mirrored selectively (see 
http://genomewiki.ucsc.edu/index.php/Minimal_Browser_Installation). 

NEW FEATURES 

Gene search 

The gene search box takes a user directly to the UCSC/ 
Known Genes or RefSeq record associated with a gene of 
interest, bypassing the default search of the entire 
database (see Figure 1). After two or more characters 
are entered, the search box suggests gene names, and 
upon selection of a particular gene, the gene's coordinates 
appear in the position/search bar. In cases where the gene 
has several isoforms, the gene region is immediately dis- 
played rather than requiring the user to first select a par- 
ticular isoform, thus eliminating an extra navigation step. 
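
The suggestion behavior described above (no suggestions until two or more characters are typed, then matching gene names) can be sketched as a prefix lookup over a sorted list of names. This is only an illustration: the function name `suggest_genes`, the result limit and the small gene list are assumptions, and the Browser's actual implementation is not described in this article.

```python
from bisect import bisect_left
from typing import List


def suggest_genes(prefix: str, sorted_names: List[str], limit: int = 10) -> List[str]:
    """Return up to `limit` gene names that start with `prefix`.

    `sorted_names` must be upper-case and sorted. Fewer than two typed
    characters yields no suggestions, as in the behavior described above.
    """
    query = prefix.strip().upper()
    if len(query) < 2:
        return []
    matches: List[str] = []
    # Binary search to the first name that could match, then scan forward.
    for name in sorted_names[bisect_left(sorted_names, query):]:
        if not name.startswith(query) or len(matches) >= limit:
            break
        matches.append(name)
    return matches


# Hypothetical, tiny gene list for illustration only.
GENES = sorted(["BRCA1", "BRCA2", "BRAF", "TP53", "TP63", "EGFR"])
print(suggest_genes("br", GENES))
```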

Drag-reorder 

Tracks within the browser image can now be reordered 
more easily by clicking on the label or vertical bar to the 
left of the track and dragging it to a new position within 
the image. If the track is a member of a composite track, 
hovering over the bar will cause the bars of all related 
subtracks to turn blue, making it easier to distinguish 
the reordered subtracks that belong to a single composite 
track. Tracks can be restored to their default order via a 
button below the track image (see Figure 2). 

BigBed/BigWig and BAM file support 

In late 2009 we introduced two new file formats for very 
large data sets, BigBed and BigWig (20), and have 
continued to add support for the display of these files as 
built-in tracks and custom tracks. BigWig and BigBed files 
are compressed binary indexed files containing data at 
several resolutions that allow the high-performance 
display of next-generation sequencing experiment results 
in the Genome Browser. A big advantage of these file 
formats is that only the portions of the files needed to 
display a particular region are transferred to UCSC, 
enabling fast remote access to large distributed data sets. 
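
One way to picture how data stored at several resolutions supports fast display is sketched below: for a given view, pick the coarsest stored summary whose bins are still no wider than one pixel. This is a simplified illustration rather than the actual BigWig/BigBed logic; the function names and the list of bin sizes are assumptions.

```python
from typing import List


def bases_per_pixel(region_start: int, region_end: int, image_width_px: int) -> float:
    """Number of bases that each horizontal pixel must represent."""
    return (region_end - region_start) / image_width_px


def pick_resolution(bp_per_px: float, bin_sizes: List[int]) -> int:
    """Choose the coarsest stored bin size that does not exceed one pixel.

    Returns 1 (full-resolution data) when every stored summary is coarser
    than a pixel, i.e. when the view is zoomed in far enough.
    """
    usable = [size for size in bin_sizes if size <= bp_per_px]
    return max(usable) if usable else 1


# Hypothetical summary levels stored in a file, in bases per bin.
levels = [10, 40, 160, 640]
print(pick_resolution(bases_per_pixel(0, 1_000_000, 1000), levels))  # wide view
print(pick_resolution(bases_per_pixel(0, 5_000, 1000), levels))      # narrow view
```

A wide view can then be drawn from a small, coarse summary, while a narrow view reads only the slice of full-resolution data that covers the region.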

We have also introduced support for the Binary 
Alignment/Map (BAM) file format in custom tracks and 
in multi-view composite tracks. BAM is the compressed 
binary version of the Sequence Alignment/Map (SAM) 
format (21), a compact and indexable representation of 
nucleotide sequence alignments. The BAM file format 
employs an architecture similar to that of BigWig/BigBed 
files: unlike PSL and other human-readable alignment 
formats, only the segments of the BAM file needed to 
display the current browser view are transmitted. This 
makes it possible to load very large BAM files as custom 
tracks in situations where the file size would preclude 
upload in other file formats. BAM custom tracks 
enable the display of high-coverage sequencing read align- 
ments from the 1000 Genomes Project 
(http://www.1000genomes.org/), other sequencing projects, and the 
underlying data from which SNPs and CNVs were 
called. (See the Neandertal Sequence Reads track in 
Figure 2 for an example of the BAM track display.) 
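
As a brief illustration of loading a remotely hosted BAM file as a custom track, the sketch below assembles a track definition line of the form used for BAM custom tracks (`type=bam` with a `bigDataUrl` setting). The helper names and the example URL are hypothetical, and the sketch assumes the BAM index is hosted alongside the file with a `.bai` suffix.

```python
def bam_track_line(name: str, bam_url: str) -> str:
    """Build a custom track definition line pointing at a remote BAM file."""
    return f'track type=bam name="{name}" bigDataUrl={bam_url}'


def bam_index_url(bam_url: str) -> str:
    """Assumed location of the BAM index: the same URL with '.bai' appended."""
    return bam_url + ".bai"


# Hypothetical URL; replace with the address of a real, web-accessible BAM file.
url = "http://example.org/data/reads.bam"
print(bam_track_line("My reads", url))
print(bam_index_url(url))
```

Because the file stays on the user's own server, only the track line itself needs to be pasted into the custom track form.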

Display enhancements 

We have augmented the Genome Browser's wiggle and 
microarray track display functionalities. Log-transformed 
wiggle data values may now be viewed in the browser, and 
we have also added a new windowing function for viewing 
wiggle data at zoomed-out levels. When a zoom-level 
is too large to show individual data values, the values 
must be combined to produce a plot point. With the 
'mean + whiskers' function, it is possible to simultaneously 
view the mean data value overlaid with measures of the 
spread of the data around it. The mean appears in a dark 
shade, one standard deviation around the mean in a medium 
shade, and the maximum/minimum in a light shade. 
Another new display feature is the transparent, multicol- 
ored overlay of multiple wiggles for some tracks (for an 
example, see Raney et al. in this issue). Standard and 
custom microarray tracks can be viewed in one of five 
combinations of red, green, blue, and yellow by selecting 
a scheme on the track details page (Figure 1) or by 
specifying an 'expColor' value in the custom track's 
settings (see 
http://genomewiki.cse.ucsc.edu/index.php/Microarray_track#Microarray_Custom_Tracks). 
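
The 'mean + whiskers' windowing described above can be sketched as follows: the data values falling under each plot point are reduced to a mean, a band of one standard deviation around it, and the minimum and maximum. This is a minimal sketch with assumed names (`mean_whiskers`, `bin_size`); whether the Browser uses the population or the sample standard deviation is not stated here, and the population form is assumed.

```python
from statistics import mean, pstdev
from typing import Dict, List


def mean_whiskers(values: List[float], bin_size: int) -> List[Dict[str, float]]:
    """Summarize consecutive windows of `bin_size` values for plotting.

    Each summary holds the five levels drawn for one plot point:
    min, mean - sd, mean, mean + sd and max.
    """
    summaries = []
    for start in range(0, len(values), bin_size):
        window = values[start:start + bin_size]
        center = mean(window)
        spread = pstdev(window)  # assumption: population standard deviation
        summaries.append({
            "min": min(window),
            "mean_minus_sd": center - spread,
            "mean": center,
            "mean_plus_sd": center + spread,
            "max": max(window),
        })
    return summaries


# Hypothetical wiggle values: a variable window followed by a flat one.
points = mean_whiskers([1, 2, 3, 4, 10, 10, 10, 10], bin_size=4)
for point in points:
    print(point)
```

A flat window collapses to a single level, while a variable window shows progressively lighter bands out to its extremes.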
Figure 1. Genome Browser display on the hg19 human assembly showing the gene search box in use. After two or more characters are typed, the 
software suggests possible matching gene names. Tracks shown in this image, from top to bottom: base position, GNF Atlas 2 (with the red/blue on 
yellow background microarray color option enabled), SNPs (131) and Genome Variants (which includes five new datasets). 



Table Browser support for external software 

This year the Table Browser benefited from an addition 
that allows users to send genomic region data to the 
Genomic Regions Enrichment of Annotations Tool 
(GREAT) (22). Given a set of genomic regions, such as 
segments of DNA selected through ChIP-Seq experiments, 
GREAT analyzes the cis-regulatory patterns in these 
regions and assesses their functional significance. 
GREAT users can also create UCSC custom tracks 
from these term-enriched subsets of genomic regions. 



NEW DATA 

We constantly add new annotation tracks and update 
existing tracks in the Genome Browser. Tracks that were 
released this year as a part of the ENCODE project are 
described in a separate publication. (See Raney et al. in 
this issue for more information.) 

Neandertal data portal 

In May 2010 we released a group of tracks on the hg18 
human browser and the panTro2 chimpanzee browser to 
accompany the initial publication of the Neandertal 
genome (23) (see Figure 2). Both the human and chimpan- 
zee browsers display alignments of Neandertal sequence 
reads and assembled contig sequences, and the human 
browser also offers human-chimp coding differences, a 
selective sweep scan (S) score, regions with the 5% 
lowest S score, SNPs used to calculate S score, and 
Neandertal mitochondrial sequence from a prior publica- 
tion (24). These tracks can be viewed in the human and 
chimpanzee browsers or accessed through the Neandertal 
portal page (http://genome.ucsc.edu/Neandertal/), which 
also provides links to download the associated tables 
and data files. 

Phenotype and disease association data 

In the past year we released two new human phenotype 
and disease association tracks. The first is based on 
DECIPHER, a database of submicroscopic chromosomal 
imbalances based on clinical information about chromo- 
somal microdeletions/duplications/insertions, transloca- 
tions and inversions (25). This track shows genomic 
regions of reported cases and their associated phenotype 
information. The second track displays SNPs from the 
Catalog of Published Genome-Wide Association Studies 
(http://www.genome.gov/gwastudies), a curated and regu- 
larly updated collection of SNPs identified by published 
studies attempting to assay at least 100,000 SNPs (26). 

Human RNA editing 

We have added an RNA editing track on the human 
(hg18) assembly based on the DARNED database (27), 
a catalog of RNA sequences that are edited after tran- 
scription, along with their corresponding genomic coord- 
inates. Only post-transcriptional editing that results in 
small changes to the identity of a nucleic acid is 
included in this track; it does not include other RNA 
processing such as splicing or methylation. The data 
were obtained from several research papers on RNA 
editing and were mapped to the human reference genome. 

Figure 2. Genome Browser image on the hg18 human assembly showing the UCSC Genes, Conservation and Neandertal tracks (Human-Chimp 
coding differences, regions with the 5% lowest S, SNPs used to calculate S and alignments of Neandertal sequence reads). The Vi33.6 sequence read 
subtrack highlighted in green is being vertically dragged to a new position. Note that hovering the mouse over any component subtrack causes the 
vertical bars to the left of all related subtracks to turn blue. 

New assemblies 

Since September 2009 we have updated the genome 
assemblies for marmoset, tetraodon, zebrafish and cat. 
We have also added new browsers for pig, European 
rabbit, giant panda, African savannah elephant and 
California sea hare. Each browser contains the baseline 
set of tracks and an additional complement of compara- 
tive genomic and other annotation tracks. 

New tracks in other organisms 

The chicken browser now features a track displaying the 
alignment of California condor (Gymnogyps californianus) 
transcripts (sequenced using 454 high-throughput DNA 
sequencing) to the galGal3 chicken genome. The condor 
read sequences were obtained from the NCBI Trace 
Archives (28). We have also released a Conservation 
track for the danRer6 zebrafish assembly showing 
multiple alignments of six vertebrate species and measure- 
ments of evolutionary conservation using phastCons from 
the PHAST package. Conserved elements identified by 
phastCons are displayed in the companion 'Most 
Conserved' track. 

Updates to existing tracks and assemblies 

Many Genome Browser tracks are updated regularly. The 
Database of Genomic Variants (DGV) (29,30) tracks, 
which detail genomic variants found among healthy 
human individuals, were updated to version 9 in the 
hg17 and hg18 assemblies and added to the hg19 
assembly browser. The ORFeome Clones tracks, which 
show alignments of clones from the ORFeome 
Collaboration (31), were also updated for human 
assemblies. The human Genome Variants tracks were aug- 
mented to include Korean (SJK) (32) and 1000 Genomes 
high-coverage pilot individuals (NA12878, NA12891, 
NA12892 and NA19240) (Figure 1). 

A number of annotations present on the human 
assembly hg18 were added to the new hg19 assembly, 
most notably tracks showing UCSC Genes and conser- 
vation. New in hg19 is the SNP track based on dbSNP 
build 131 (33) (Figure 1). The UCSC Genes track is 
a moderately conservative set of gene predictions based 
on data from RefSeq, GenBank, CCDS and UniProt. 
The Conservation track shows multiple alignments of 
46 vertebrate species and measurements of evolutionary 
conservation using two methods (phastCons and phyloP) 
from the PHAST package for all vertebrate species, as well 
as primate and placental mammal subsets. The SNP track 
contains over 26 million mappings of more than 23 million 
reference SNPs that have been mapped to the reference 
genome by dbSNP. This represents a significant increase 
from the provisional hg19 mappings of build 130 (33). As 
we continue to migrate the bulk of our hg18 annotation 
tracks to the hg19 assembly, we encourage our contribu- 
tors to submit hg19-based data sets for inclusion in this 
effort. 

Tracks that are regularly updated on the mouse browsers 
include the International Gene Trap Consortium (IGTC) 
tracks (34) (updated monthly), the Mouse Genome 
Informatics MGI tracks (35), which show quantitative 
trait loci, phenotypes and alleles, and the IKMC Genes 
tracks (36), which show the genes targeted by the 
International Knockout Mouse Consortium for generating 
mouse embryonic stem cells containing a null mutation in 
every gene in the mouse genome. 

Some of our regularly updated tracks appear on 
multiple browsers. These include the Consensus Coding 
Sequence (CCDS) (37) tracks, which were updated on 
the human and mouse genomes, the Mammalian Gene 
Collection (MGC) tracks (38) on the human, mouse, rat, 
cow and frog genomes, and the Ensembl Genes tracks (6), 
available on approximately 25 different organisms. 
RefSeq and mRNA tracks, which display aligned 
sequences from all organisms in GenBank (5), are 
updated nightly, and EST tracks are updated weekly. 



FUTURE DIRECTIONS 

We plan to incorporate several new features as well as 
exciting new variation and medical genomics data over 
the next year. We will also continue to add new and 
updated vertebrate and other selected model organism 
assemblies that have been deposited into GenBank. 
(Only assemblies registered and deposited at NCBI will 
be considered for hosting at UCSC, as stipulated in the 
Browser Genome Release Agreement instituted by NCBI, 
Ensembl, and UCSC.) 

By late 2010 we plan to release a utility that enables 
users to quickly search track names and descriptions. 
This tool will provide both simple and advanced search 
interfaces, with the advanced interface allowing users to 
further refine their search criteria and search the metadata 
associated with ENCODE tracks (e.g. cell line, transcrip- 
tion factor, stage, etc.). Also by the end of 2010, users will 
be able to quickly access configuration and navigation 
shortcuts on the Genome Browser image by right-clicking 
on the vertical bar to the left of a track. 

We are developing data hub support that will make it 
possible to view user-supplied data (such as BigWig, 
BigBed and BAM files) with the more sophisticated 
track display options currently used on other UCSC 
tracks such as composite tracks. We are also working on 
several improvements to the display of BAM files such as 
filtering by flag and density-wiggle view. We plan to 
enable data extraction from BAM file-based tracks via 
the Table Browser. 

We anticipate adding a number of new variation tracks, 
including data from the 1000 Genomes project as well 
as from dbVar, a new structural variation database 
at NCBI. We are currently working on browser display 
support for data stored in Variant Call Format 
(VCF; http://1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcf4.0), 
a format developed by the 
1000 Genomes Project to represent variant data. 
Additionally we are discussing strategies for distinguishing 
SNPs annotated as 'clinically associated' by dbSNP in our 
SNP annotations, and looking into other stratifications of 
these data such as singly mapped versus multiply mapped. 

We plan to import additional personal genome variant 
tracks from the Pennsylvania State University 
Bioinformatics Genome Browser 
(http://main.genome-browser.bx.psu.edu/), including updated 1000 Genomes 
high-coverage trio variants as well as variants from five 
Khoisan and Bantu genomes (39) and a Paleo-Eskimo 
Saqqaq genome (40). 

We intend to add medical genomics data from the 
International Standards for Cytogenomic Arrays (ISCA) 
consortium (41). These data should help clinicians inter- 
pret array CGH results by aggregating the results of 
potentially thousands of cases from clinics all over the 
world in one place. The data will be released to dbGaP 
and dbVar at NCBI and then integrated into the browser 
for display in the context of our other content. 

Finally, we plan to offer cloud support for mirrors, 
providing a Genome Browser image that will enable labs 
to instantiate a browser for private use without the 
overhead of a local server (42) and supporting the 
simple construction of new Genome Browsers on novel 
genome sequences. 



CONTACTING US 

We have two public, moderated mailing lists for user 
support: genome@soe.ucsc.edu for general questions 
about the Genome Browser, and genome-mirror@soe.ucsc.edu 
for questions specific to the setup and maintenance 
of Genome Browser mirrors. Archives of both lists 
are searchable from our contacts page at 
http://genome.ucsc.edu/contacts.html. You may also reach us at 
genome-www@soe.ucsc.edu, the preferred address for 
inquiring about mirror site licenses and reporting server 
errors. Messages sent to this address are not archived in a 
publicly searchable location. 